About me

Outline for Today

Part 1

Visualizing different types of data

Part 2

Principles for making effective plots

Attribution

Ideas are from “Fundamentals of Data Visualization” by Claus Wilke. You should check it out!

Slides are my own.


data-to-viz


Visualizing Amounts

Use bars


Visualizing Amounts

Use bars. Sensibly rearrange.

In this case: movie order and descending order.


Visualizing Amounts

Bars must start at zero, because we read a bar's area as the amount. Don’t do this:


Visualizing Amounts

When zero doesn’t matter, use points:


Visualizing Amounts

x-labels too big? Don’t be afraid to swap axes.


Visualizing Amounts

Sorting, again.
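A minimal sketch of a sorted, horizontal bar chart in ggplot2. The data frame `amounts` and its values are made up for illustration and are not the data behind the slides. Putting the category on the y-axis (supported in ggplot2 3.3.0 and later) is one way to swap the axes.

```r
library(ggplot2)

# Hypothetical data: made-up counts, for illustration only.
amounts <- data.frame(
  category = c("Alpha", "Beta", "Gamma", "Delta"),
  count    = c(12, 30, 7, 21)
)

# reorder() sorts the factor levels by count, so the bars appear in size order.
amounts$category <- reorder(amounts$category, amounts$count)

# Category on the y-axis gives horizontal bars with readable labels.
p_bars <- ggplot(amounts, aes(x = count, y = category)) +
  geom_col() +                 # geom_col() bars start at zero
  labs(x = "Count", y = NULL)

p_bars
```

Without the `reorder()` step the bars would appear in alphabetical order, which rarely means anything to the reader.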


Visualizing Distributions

Want to compare body mass of three penguin species? Please don’t use pinhead plots.


Visualizing Distributions

Plot all the data instead.


Visualizing Distributions

Even better, add some jitter and alpha transparency:
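A rough sketch of jitter plus alpha transparency. The built-in `iris` data stands in for the penguin data here (an assumption, chosen so the example runs without extra packages), and the `width` and `alpha` values are arbitrary starting points.

```r
library(ggplot2)

# iris ships with R and stands in for the penguin data (an assumption).
p_jitter <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  # width spreads points sideways within each species;
  # height = 0 leaves the measured values untouched.
  geom_jitter(width = 0.2, height = 0, alpha = 0.5) +
  labs(x = NULL, y = "Sepal length (cm)")

p_jitter
```

Jittering only along the categorical axis matters: moving points along the measurement axis would misreport the data.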


Visualizing Distributions: too much data to show

Too much data to show? Could use boxplots:


Visualizing Distributions: too much data to show

Could also make a histogram for each one.


Visualizing Distributions: too much data to show

Whatever you do, don’t overlay the histograms on one panel and tell them apart by colour:


Visualizing Distributions: too much data to show

Better is to use density plots:


Visualizing Distributions: too much data to show

Even better is to use ridge plots:

Who cares about the density values, anyway?
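A ridge plot can be sketched with the ggridges package, which is a separate install and not part of ggplot2. Again `iris` is a stand-in dataset, not the one shown on the slides.

```r
library(ggplot2)
library(ggridges)  # assumed to be installed; provides geom_density_ridges()

# One density curve per species, stacked along the y-axis.
# The density values themselves are not shown on any axis.
p_ridges <- ggplot(iris, aes(x = Sepal.Length, y = Species)) +
  geom_density_ridges(alpha = 0.6) +
  labs(x = "Sepal length (cm)", y = NULL)

p_ridges
```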


Visualizing Distributions: too much data to show

Can pack in many categories (ordered, of course).


Visualizing Distributions: too much data to show

You might even be able to get away with colouring by continent (though that is debatable).


Activity: Worsen the plot

Code and idea by Firas Moosvi


Data-to-ink ratio

Less is More


Overlapping Points

How do you know there aren’t overlapping points here? You don’t.


Overlapping Points

Add some transparency, and suddenly you can tell.


Overlapping Points

Or, jitter the points a little bit.


Overlapping Points

What if jittering isn’t an option, and alpha transparency isn’t enough?


Overlapping Points

Consider reducing the size of the points:


Overlapping Points

Or, use hexagonal binning (heatmap):
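A minimal sketch of hexagonal binning, assuming the hexbin package is installed (`geom_hex()` depends on it). The `diamonds` data from ggplot2 is used only because it has enough rows to overplot badly, and `bins = 40` is an arbitrary choice.

```r
library(ggplot2)  # geom_hex() also needs the hexbin package installed

# Count the points that fall in each hexagon and map the count to fill colour.
p_hex <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_hex(bins = 40) +
  scale_fill_viridis_c(name = "Points")

p_hex
```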


Colour

Don’t try to choose your own colours.


Colour

Leave it to an expert: https://colorbrewer2.org/
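ColorBrewer palettes are available in ggplot2 by name, so there is no need to copy hex codes by hand. A small sketch, with `iris` as a stand-in dataset and "Dark2" as one arbitrary qualitative palette:

```r
library(ggplot2)

# scale_colour_brewer() applies a named ColorBrewer palette to a discrete variable.
p_brewer <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, colour = Species)) +
  geom_point() +
  scale_colour_brewer(palette = "Dark2")

p_brewer
```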


Colour

Avoid too many colours


Colour

Are you sure you don’t just want to highlight a few categories, or even one category, of interest?
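One way to highlight a single category is to collapse everything else into a grey "Other" group. This is only a sketch: the `mpg` data from ggplot2 and the choice of "suv" as the category of interest are assumptions for illustration.

```r
library(ggplot2)

# Copy the data and flag the one class of interest ("suv" is an arbitrary pick).
cars <- mpg
cars$highlight <- ifelse(cars$class == "suv", "SUV", "Other")

# Grey for the background group, one strong colour for the highlighted group.
p_highlight <- ggplot(cars, aes(x = displ, y = hwy, colour = highlight)) +
  geom_point() +
  scale_colour_manual(values = c(Other = "grey70", SUV = "firebrick"), name = NULL)

p_highlight
```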


Colour Blindness

The previous colour palette as seen with protanopia (reduced sensitivity to red):

(Converted by hclwizard)


Colour Blindness

You could try to accommodate colour blindness and still use colour…

Viridis scale


Colour Blindness

Better yet, don’t rely on colour at all. Facet by species:

Notice the axes are comparable.
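Faceting can be sketched with `facet_wrap()`. The `iris` data is a stand-in for the penguin data (an assumption). By default the panels share the same axis ranges, which is what keeps them comparable.

```r
library(ggplot2)

# One panel per species; no colour needed to tell the groups apart.
p_facets <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(alpha = 0.6) +
  facet_wrap(~ Species)  # default scales = "fixed" shares axes across panels

p_facets
```

Setting `scales = "free"` would give each panel its own axes and make comparisons across panels much harder.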


Choose an Appropriate Scale

Here, the data are pressed against the y-axis. Tons of whitespace.


Choose an Appropriate Scale

Use a log scale on the x-axis.
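A minimal sketch of a log-scaled x-axis. The `msleep` data from ggplot2 is an assumed stand-in: body weights range from a few grams to several tonnes, so on a linear axis almost every point sits against the y-axis.

```r
library(ggplot2)

# scale_x_log10() spreads out values that span several orders of magnitude.
p_log <- ggplot(msleep, aes(x = bodywt, y = sleep_total)) +
  geom_point() +
  scale_x_log10() +
  labs(x = "Body weight (kg, log scale)", y = "Total sleep (hours)")

p_log
```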


Activity: Improve the Plot

How can we make this plot better?

Final Remarks: Making plots

  • All plots today are reproducible.
  • Built with ggplot2 in R.
  • Take a look at Episode 5 of STAT 545 for an intro to ggplot2.
  • Non-reproducible tools (like Excel) are not recommended, but will do if that’s all you know.